Extraction of inference rules from heterogeneous graphs

ABSTRACT

According to an aspect, a heterogeneous graph in a data store is accessed. The heterogeneous graph includes a plurality of nodes having a plurality of node types. The nodes are connected by edges having a plurality of relation types. One or more intermediary graphs are created based on the heterogeneous graph. The intermediary graphs include intermediary nodes that are the relation types of the edges of the heterogeneous graph and include intermediary links between the intermediary nodes based on shared instances of the nodes between relation types in the heterogeneous graph. The intermediary graphs are traversed to find sets of relations based on intermediary links according to a template. An inference rule is extracted from the heterogeneous graph based on finding sets of relations in the intermediary graphs. The inference rule defines an inferred relation type between at least two of the nodes of the heterogeneous graph.

DOMESTIC PRIORITY

This application is a continuation of U.S. application Ser. No. 14/485,942 filed Sep. 15, 2014, the disclosure of which is incorporated by reference herein in its entirety.

BACKGROUND

The present disclosure relates generally to inference rule extraction, and more specifically, to extraction of inference rules from heterogeneous graphs.

Information extracted from literature can be summarized either manually or automatically in networks or graphs that define relations between nodes representing various elements. A heterogeneous graph may include several node types and many relation types defined between nodes of the heterogeneous graph. Human users may examine the contents of a heterogeneous graph and attempt to extract knowledge by looking for patterns in relationships between various node types and relation types. However, looking at a heterogeneous graph in a visual interface to infer rules from the heterogeneous graph can be challenging where semantic meaning of relations is not available. Additionally, in a very large graph that includes millions of nodes and edges that define relations between the nodes, it is impractical for a human to extract all inferable rules from the graph.

SUMMARY

Embodiments include a method for inference rule extraction from a heterogeneous graph. The method includes accessing a heterogeneous graph in a data store. The heterogeneous graph includes a plurality of nodes having a plurality of node types. The nodes are connected by edges having a plurality of relation types. One or more intermediary graphs are created based on the heterogeneous graph. The one or more intermediary graphs include intermediary nodes that are the relation types of the edges of the heterogeneous graph and further include intermediary links between the intermediary nodes based on shared instances of the nodes between the relation types in the heterogeneous graph. The one or more intermediary graphs are traversed to find sets of relations based on the intermediary links according to a template. An inference rule is extracted from the heterogeneous graph based on finding the sets of relations in the one or more intermediary graphs. The inference rule defines an inferred relation type between at least two of the nodes of the heterogeneous graph.

Additional features and advantages are realized through the techniques of the present disclosure. Other embodiments and aspects of the disclosure are described in detail herein. For a better understanding of the disclosure with the advantages and the features, refer to the description and to the drawings.

BRIEF DESCRIPTION OF THE SEVERAL VIEWS OF THE DRAWINGS

The subject matter which is regarded as the invention is particularly pointed out and distinctly claimed in the claims at the conclusion of the specification. The foregoing and other features and advantages of the invention are apparent from the following detailed description taken in conjunction with the accompanying drawings in which:

FIG. 1 depicts a block diagram of a system for inference rule extraction in accordance with an embodiment;

FIG. 2 depicts an example of a heterogeneous graph in accordance with an embodiment;

FIG. 3A depicts a source intermediary graph as an intermediary graph in accordance with an embodiment;

FIG. 3B depicts a target intermediary graph as an intermediary graph in accordance with an embodiment;

FIG. 3C depicts a target-source intermediary graph as an intermediary graph in accordance with an embodiment;

FIG. 4 depicts a process flow for inference rule extraction from a heterogeneous graph in accordance with an embodiment;

FIG. 5 depicts a high-level block diagram of a question-answer (QA) framework where embodiments of inference rule extraction can be implemented in accordance with an embodiment; and

FIG. 6 depicts a processing system in accordance with an embodiment.

DETAILED DESCRIPTION

Embodiments disclosed herein relate to inference rule extraction from a heterogeneous graph. As used herein, the term “semantic relations” refers to relationships between concepts or meanings. Examples related to the medical field are described herein; however, embodiments are not limited to applications in the medical field. Embodiments can be utilized by any application that uses a heterogeneous graph from which inference rules can be extracted to support data analysis and knowledge extraction, including, but not limited to: troubleshooting and repair (e.g., to facilitate diagnostic analysis of a system or component) and a general question-answer (QA) system.

As one example, in the medical domain, a vast number of knowledge sources and ontologies exist. Such information is also growing and changing extremely quickly, making the information difficult for people to read, process, and remember. The combination of recent developments in information extraction and the availability of unparalleled medical resources thus offers an opportunity to develop new techniques to help healthcare professionals overcome the cognitive challenges they may face in clinical decision making. The medical domain has a vast amount of literature found in textbooks, encyclopedias, guidelines, electronic medical records, and many other sources. The amount of data is also growing at an extremely high speed. Substantial understanding of the medical domain has already been included in the Unified Medical Language System® (UMLS) knowledge base (KB), which includes medical concepts, relations, and definitions. The UMLS KB is a compendium of many controlled vocabularies in the biomedical sciences and may be viewed as a comprehensive thesaurus and ontology of biomedical concepts. It provides a mapping structure among these vocabularies and thus allows translation among the various terminology systems. The 2013 version of the UMLS KB contains information about more than 3 million concepts from over 160 source vocabularies.

FIG. 1 depicts a block diagram of a system 100 for inference rule extraction in accordance with an embodiment. One or more instances of a heterogeneous graph 102 can be constructed based on literature 104 by an automated or manual process and stored in a data store 103. The data store 103 can be a memory device or subsystem, such as a computer memory system, and may be distributed between multiple physical locations or stored at a single location. The literature 104 can include a body of documents, journals, manuals, studies, and the like which describe information. Natural language processing and semantic relation extraction can be used to convert information in the literature 104 into the heterogeneous graph 102. Annotation 106 can be performed on the heterogeneous graph 102 to add or modify semantic relations. Alternatively, annotation 106 can be used to manually create a heterogeneous graph without relying on automatic techniques. The UMLS KB is one example of a manually constructed heterogeneous graph that includes several million nodes, such as diseases, treatments, and symptoms, as well as hundreds of semantic relation types defined between nodes. As the size of the heterogeneous graph 102 grows, it may be too unwieldy to manually inspect thousands or millions of concepts captured in the heterogeneous graph 102 to discover and infer rules captured therein.

As one example, if a brute force approach is taken manually or by a computer implemented process to discover all rules pertaining to one node, where a total number of n nodes exist in a graph and the out-degree of each node is T (i.e., number of relation types), overall complexity of the inspection process for each node would be O(n²*T³) in big-O notation, i.e., order of the growth rate of the function. This is because for each node, rules can be mined that pertain to two of (n−1) other nodes. Selection can be done in (n−1)C(2) ways, where C is a node about which rules are sought. To traverse all the directional rules between three nodes, complexity is O(T³), and hence the complexity for all n nodes is O(n³*T³). Exemplary embodiments improve computing system functionality by reducing computational complexity to O(n²+T³) to infer all rules. Larger graph sizes would see larger degrees of computational improvement, thus improving computer system functionality by reducing required time to extract all rules that can be inferred from a heterogeneous graph and increasing processing resource availability for other tasks.
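As a rough illustration of the gap between the two growth rates stated above, the following sketch treats each big-O expression as a literal operation count. The function names and the sample values of n and T are assumptions chosen for illustration only; constant factors and implementation details are ignored.

```python
def brute_force_cost(n: int, t: int) -> int:
    """Operation count if O(n^3 * T^3) is read literally (illustrative only)."""
    return n ** 3 * t ** 3


def intermediary_cost(n: int, t: int) -> int:
    """Operation count if O(n^2 + T^3) is read literally (illustrative only)."""
    return n ** 2 + t ** 3


# Hypothetical graph sizes: n nodes, T relation types.
for n, t in [(100, 10), (1_000, 10), (10_000, 100)]:
    brute = brute_force_cost(n, t)
    inter = intermediary_cost(n, t)
    print(f"n={n}, T={t}: brute force ~{brute:.2e}, intermediary ~{inter:.2e}")
```

Under these assumptions the ratio between the two counts grows with n, which is consistent with the statement that larger graphs see larger degrees of improvement.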

In the example of FIG. 1, inference rule extraction 108 accesses one or more templates 110 to discover a rule pattern for a rule to be inferred and extracted from the heterogeneous graph 102. An example of a rule pattern in the templates 110 is: “If A relation_x B and B relation_y C then A relation_z C”, where A, B, and C are node types and relation_x, relation_y, and relation_z are relation types. An inference rule can define an inferred relation type between at least two of the nodes of the heterogeneous graph 102. The inference rule extraction 108 can access the heterogeneous graph 102 in the data store 103. To infer such a rule for particular instances of nodes and relations within the heterogeneous graph 102, the inference rule extraction 108 creates one or more intermediary graphs 112. The intermediary graphs 112 include intermediary nodes that are relation types of the edges of the heterogeneous graph 102. The intermediary graphs 112 also include intermediary links between the intermediary nodes based on shared instances of nodes between the relation types in the heterogeneous graph 102. The intermediary graphs 112 may be traversed to find sets of relations based on the intermediary links according to a rule pattern of the templates 110. An inference rule can be extracted from the heterogeneous graph 102 and stored in extracted inference rules 114 based on finding the sets of relations in the intermediary graphs 112.

As one example, the intermediary graphs 112 can include a source intermediary graph having intermediary nodes connected with undirected links as intermediary links. The undirected links may be based on the relation types of the intermediary nodes sharing a common source node in the heterogeneous graph 102. The intermediary graphs 112 can also include a target intermediary graph having the intermediary nodes connected with undirected links as the intermediary links, where the undirected links are based on the relation types of the intermediary nodes sharing a common target node in the heterogeneous graph 102. The intermediary graphs 112 may also include a target-source intermediary graph having the intermediary nodes connected with directed links as the intermediary links. The directed links can be based on the relation types of the intermediary nodes having a source node that is a target node of another relation type in the heterogeneous graph 102.

FIG. 2 depicts an example of a heterogeneous graph 200 in accordance with an embodiment. The heterogeneous graph 200 is an example of a portion of the heterogeneous graph 102 of FIG. 1. The heterogeneous graph 200 includes multiple groups 202A, 202B that have common relations, as well as other groups (not depicted). Group 202A includes a medicine node 204A that has a value of “magnesium sulphate”, a symptom node 206A that has a value of “pain”, and a disease node 208A that has a value of “neuralgia”, where the medicine node 204A, symptom node 206A, and disease node 208A are examples of different node types. Group 202A also includes a number of relations defined between the node types. In the example of FIG. 2, from medicine node 204A to symptom node 206A, a may_prevent relation 210 is defined as an edge. From symptom node 206A to disease node 208A, a definitional_manifestation_of relation 212 is defined as an edge. A may_treat relation 214 is defined as an edge between the medicine node 204A and disease node 208A. The may_prevent relation 210, definitional_manifestation_of relation 212, and may_treat relation 214 are examples of different relation types that are edges in the heterogeneous graph 200.

The group 202B includes a medicine node 204B that has a value of “capsaicin”, a symptom node 206B that has a value of “seizures”, and a disease node 208B that has a value of “eclampsia”, where the medicine node 204B, symptom node 206B, and disease node 208B are examples of different node types. Group 202B also includes a number of relations defined as edges between the node types. In the example of FIG. 2, from medicine node 204B to symptom node 206B, a may_prevent relation 210 is defined as an edge. From symptom node 206B to disease node 208B, a definitional_manifestation_of relation 212 is defined as an edge. In an exemplary embodiment, a relation between medicine node 204B and disease node 208B may not be defined but can be inferred as a may_treat relation 214 as further described herein.
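One possible in-memory encoding of groups 202A and 202B is sketched below. The triple layout, the names NODE_TYPES and EDGES, and the helper sources_and_targets are assumptions made for illustration; the disclosure does not prescribe a storage format.

```python
from collections import defaultdict

# Hypothetical node table: node value -> node type (from FIG. 2).
NODE_TYPES = {
    "magnesium sulphate": "medicine",
    "capsaicin": "medicine",
    "pain": "symptom",
    "seizures": "symptom",
    "neuralgia": "disease",
    "eclampsia": "disease",
}

# Hypothetical edge list: (source node, relation type, target node).
EDGES = [
    # Group 202A
    ("magnesium sulphate", "may_prevent", "pain"),
    ("pain", "definitional_manifestation_of", "neuralgia"),
    ("magnesium sulphate", "may_treat", "neuralgia"),
    # Group 202B (no may_treat edge is defined; it is the relation to infer)
    ("capsaicin", "may_prevent", "seizures"),
    ("seizures", "definitional_manifestation_of", "eclampsia"),
]


def sources_and_targets(edges):
    """Collect, per relation type, the set of source nodes and target nodes."""
    sources, targets = defaultdict(set), defaultdict(set)
    for src, rel, tgt in edges:
        sources[rel].add(src)
        targets[rel].add(tgt)
    return dict(sources), dict(targets)


S, T = sources_and_targets(EDGES)
print(S["may_prevent"])  # source nodes of may_prevent across both groups
print(T["may_treat"])    # target nodes of may_treat (group 202A only)
```

Grouping nodes by relation type in this way yields the per-relation source and target sets on which the intermediary graphs are defined.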

Upon accessing the heterogeneous graph 200 of FIG. 2, the inference rule extraction 108 of FIG. 1 can create the intermediary graphs 112 of FIG. 1 including a source intermediary graph 300A of FIG. 3A, a target intermediary graph 300B of FIG. 3B, and a target-source intermediary graph 300C of FIG. 3C.

The source intermediary graph 300A of FIG. 3A includes an intermediary node 302A that has a value of “definitional_manifestation_of” as a relation type of the edge: definitional_manifestation_of relation 212 of FIG. 2. The source intermediary graph 300A also includes an intermediary node 304A that has a value of “may_prevent” as a relation type of the edge: may_prevent relation 210 of FIG. 2. The source intermediary graph 300A further includes an intermediary node 306A that has a value of “may_treat” as a relation type of the edge: may_treat relation 214 of FIG. 2. The source intermediary graph 300A can connect intermediary nodes 304A and 306A with an undirected link 308 as an intermediary link. The intermediary nodes 304A and 306A are relation types that share a common source node in the heterogeneous graph 200 of FIG. 2. For example, medicine node 204A is a common source node of the may_prevent relation 210 and the may_treat relation 214 of FIG. 2.

In general terms, the source intermediary graph 300A is defined as follows: an intermediary link exists between two intermediary nodes (e.g., relation_x and relation_y) if the source nodes for those relations are sufficiently similar. As an example, a set of source nodes (S) of relation_x and a set of source nodes (S) of relation_y can be considered sufficiently similar if the Jaccard value (J) between S(relation_x) and S(relation_y) is non-zero, i.e., J(S(relation_x), S(relation_y))>0. A Jaccard value measures the similarity between two sets and is defined as the size of the intersection of the sets divided by the size of the union of the sets.
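The definition above can be sketched as follows for group 202A of FIG. 2. The function names jaccard and undirected_links and the dictionary S are illustrative assumptions; the non-zero threshold follows the example criterion given in the text.

```python
from itertools import combinations


def jaccard(a: set, b: set) -> float:
    """Size of the intersection divided by size of the union (0.0 if both empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


# Source-node sets S(relation) for group 202A of FIG. 2.
S = {
    "may_prevent": {"magnesium sulphate"},
    "definitional_manifestation_of": {"pain"},
    "may_treat": {"magnesium sulphate"},
}


def undirected_links(node_sets: dict) -> set:
    """Link two relation types when the Jaccard value of their node sets is non-zero."""
    links = set()
    for rel_x, rel_y in combinations(sorted(node_sets), 2):
        if jaccard(node_sets[rel_x], node_sets[rel_y]) > 0:
            links.add(frozenset((rel_x, rel_y)))
    return links


source_graph = undirected_links(S)
print(source_graph)  # one link, between may_prevent and may_treat
```

The single resulting link corresponds to undirected link 308 between intermediary nodes 304A and 306A. Passing target-node sets instead of source-node sets to the same hypothetical helper would produce a target intermediary graph.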

FIG. 3B depicts the target intermediary graph 300B as one of the intermediary graphs 112 of FIG. 1 in accordance with an embodiment. The target intermediary graph 300B includes an intermediary node 302B that has a value of “definitional_manifestation_of” as a relation type of the edge: definitional_manifestation_of relation 212 of FIG. 2. The target intermediary graph 300B also includes an intermediary node 304B that has a value of “may_prevent” as a relation type of the edge: may_prevent relation 210 of FIG. 2. The target intermediary graph 300B further includes an intermediary node 306B that has a value of “may_treat” as a relation type of the edge: may_treat relation 214 of FIG. 2. The target intermediary graph 300B can connect intermediary nodes 302B and 306B with an undirected link 310 as an intermediary link. The intermediary nodes 302B and 306B are relation types that share a common target node in the heterogeneous graph 200 of FIG. 2. For example, disease node 208A is a common target node of the definitional_manifestation_of relation 212 and the may_treat relation 214 of FIG. 2.

In general terms, the target intermediary graph 300B is defined as follows: an intermediary link exists between two intermediary nodes (e.g., relation_x and relation_y) if the sets of target nodes for the two relations are sufficiently similar. Again, a set of target nodes (T) of relation_x and a set of target nodes (T) of relation_y can be considered sufficiently similar if the Jaccard value (J) between T(relation_x) and T(relation_y) is non-zero, i.e., J(T(relation_x), T(relation_y))>0.

FIG. 3C depicts the target-source intermediary graph 300C as one of the intermediary graphs 112 of FIG. 1 in accordance with an embodiment. The target-source intermediary graph 300C includes an intermediary node 302C that has a value of “definitional_manifestation_of” as a relation type of the edge: definitional_manifestation_of relation 212 of FIG. 2. The target-source intermediary graph 300C also includes an intermediary node 304C that has a value of “may_prevent” as a relation type of the edge: may_prevent relation 210 of FIG. 2. The target-source intermediary graph 300C further includes an intermediary node 306C that has a value of “may_treat” as a relation type of the edge: may_treat relation 214 of FIG. 2. The target-source intermediary graph 300C can connect intermediary nodes 302C and 304C with a directed link 312 as an intermediary link from intermediary node 302C to intermediary node 304C. The intermediary nodes 302C and 304C are relation types such that the source node of one is the target node of the other in the heterogeneous graph 200 of FIG. 2. For example, symptom node 206A is a source node of the definitional_manifestation_of relation 212 and is a target node with respect to the may_prevent relation 210 of FIG. 2.

In general terms, the target-source intermediary graph 300C is defined as follows: an intermediary link exists between two intermediary nodes (e.g., relation_x and relation_y) if the set of target nodes (T) of relation_x and the set of source nodes (S) of relation_y are sufficiently similar. The Jaccard value (J) between T(relation_x) and S(relation_y) can be used to measure similarity. In this graph, unlike the source intermediary graph 300A and the target intermediary graph 300B of FIGS. 3A and 3B, edges are directional, pointing from relation_x to relation_y.
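A minimal sketch of this directed construction for group 202A of FIG. 2 is given below. It follows the general definition as stated, with each link pointing from relation_x to relation_y when J(T(relation_x), S(relation_y)) is non-zero. The names jaccard, directed_links, S, and T are assumptions for illustration.

```python
def jaccard(a: set, b: set) -> float:
    """Size of the intersection divided by size of the union (0.0 if both empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


# Source-node sets S(relation) and target-node sets T(relation) for group 202A.
S = {
    "may_prevent": {"magnesium sulphate"},
    "definitional_manifestation_of": {"pain"},
    "may_treat": {"magnesium sulphate"},
}
T = {
    "may_prevent": {"pain"},
    "definitional_manifestation_of": {"neuralgia"},
    "may_treat": {"neuralgia"},
}


def directed_links(targets: dict, sources: dict) -> set:
    """Return ordered pairs (relation_x, relation_y) with J(T(x), S(y)) > 0."""
    return {
        (rel_x, rel_y)
        for rel_x in targets
        for rel_y in sources
        if jaccard(targets[rel_x], sources[rel_y]) > 0
    }


target_source_graph = directed_links(T, S)
print(target_source_graph)
```

Here the only pair found is (may_prevent, definitional_manifestation_of), because symptom node “pain” is a target of may_prevent and a source of definitional_manifestation_of. Ordered tuples rather than unordered sets are used so that the direction of each link is preserved.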

Using the combination of the source intermediary graph 300A, the target intermediary graph 300B, and the target-source intermediary graph 300C, one or more of the extracted inference rules 114 can be extracted. As one example, for each edge, (r_1, r_2), in the target-source intermediary graph 300C, a set of relations R_3_1 can be found such that (r_1, r_3_1) exists in the source intermediary graph 300A and a set of relations R_3_2 can be found such that (r_2, r_3_2) exists in the target intermediary graph 300B. A set of relations R_3 equals R_3_1 intersected with R_3_2. This results in mining inference rules matching a rule pattern “if A r_1 B and B r_2 C then A r_3 C” from the templates 110 of FIG. 1.

As a generalized example, consider an inference rule (a--r_1--b, b--r_2--c, a--r_3--c). Set S denotes a set of source nodes for a relation type, and set T denotes a set of target nodes for a relation type. Since this inference rule exists, T(r_1)∩S(r_2) is non-empty (i.e., it has element b). S(r_1)∩S(r_3) is non-empty (i.e., it has element a). T(r_2)∩T(r_3) is non-empty (i.e., it has element c). Thus, an inference rule can be found.
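The traversal described above can be sketched end to end as follows, using group 202A of FIG. 2 as input. This is a minimal sketch under stated assumptions, not a definitive implementation: the names EDGES, jaccard, and mine_candidate_rules are hypothetical, the intermediary links are tested on the fly rather than materialized as graphs, and the output is treated as candidate rules because the three non-empty intersections are shown in the text as conditions that hold when a rule exists.

```python
from collections import defaultdict

# Hypothetical edge list for group 202A: (source node, relation type, target node).
EDGES = [
    ("magnesium sulphate", "may_prevent", "pain"),
    ("pain", "definitional_manifestation_of", "neuralgia"),
    ("magnesium sulphate", "may_treat", "neuralgia"),
]


def jaccard(a: set, b: set) -> float:
    """Size of the intersection divided by size of the union (0.0 if both empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def mine_candidate_rules(edges):
    """Return triples (r_1, r_2, r_3) for "if A r_1 B and B r_2 C then A r_3 C"."""
    S, T = defaultdict(set), defaultdict(set)
    for src, rel, tgt in edges:
        S[rel].add(src)
        T[rel].add(tgt)
    relations = sorted(S)

    rules = []
    for r1 in relations:
        for r2 in relations:
            # (r_1, r_2) must be an edge of the target-source intermediary graph.
            if jaccard(T[r1], S[r2]) == 0:
                continue
            # R_3_1: relations linked to r_1 in the source intermediary graph.
            r3_1 = {r for r in relations if r != r1 and jaccard(S[r1], S[r]) > 0}
            # R_3_2: relations linked to r_2 in the target intermediary graph.
            r3_2 = {r for r in relations if r != r2 and jaccard(T[r2], T[r]) > 0}
            # R_3 is the intersection of R_3_1 and R_3_2.
            for r3 in sorted(r3_1 & r3_2):
                rules.append((r1, r2, r3))
    return rules


rules = mine_candidate_rules(EDGES)
print(rules)
```

For this input the sketch yields the single triple (may_prevent, definitional_manifestation_of, may_treat). The loops run over relation types rather than node instances, which is where the reduction in work relative to inspecting node triples comes from.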

By applying the inference rule extraction 108 of FIG. 1 to group 202A of FIG. 2, the intermediary graphs 112 of FIG. 1 can include the source intermediary graph 300A of FIG. 3A, the target intermediary graph 300B of FIG. 3B, and the target-source intermediary graph 300C of FIG. 3C. Where the templates 110 of FIG. 1 include the rule pattern “if A r_1 B and B r_2 C then A r_3 C”, the extracted inference rules 114 of FIG. 1 for group 202A of FIG. 2 can include “if magnesium sulphate may prevent pain and pain is a definitional manifestation of neuralgia then magnesium sulphate may treat neuralgia”. This can be generalized to an inferred rule that if a medicine node has a may_prevent relation to a symptom node and the symptom node has a definitional_manifestation_of relation to a disease node, then the medicine node should also have a may_treat relation to the disease node. If may_treat relation 214 is missing or not labeled for group 202B of FIG. 2, a may_treat relation 214 can be inferred between medicine node 204B (FIG. 2) and disease node 208B (FIG. 2) based on the inferred rule extracted from group 202A of FIG. 2. Thus, it can be inferred that “if capsaicin may prevent seizures and seizures are a definitional manifestation of eclampsia then capsaicin may treat eclampsia”.
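Applying a generalized rule to fill in a missing relation, as in the group 202B example, might look like the sketch below. The function apply_rule, the edge list, and the rule tuple layout (r_1, r_2, r_3) are assumptions for illustration; node-type checks are omitted for brevity.

```python
# Hypothetical edge list for FIG. 2: (source node, relation type, target node).
EDGES = [
    # Group 202A
    ("magnesium sulphate", "may_prevent", "pain"),
    ("pain", "definitional_manifestation_of", "neuralgia"),
    ("magnesium sulphate", "may_treat", "neuralgia"),
    # Group 202B (may_treat is not defined)
    ("capsaicin", "may_prevent", "seizures"),
    ("seizures", "definitional_manifestation_of", "eclampsia"),
]

# Generalized rule taken from group 202A: if A r_1 B and B r_2 C then A r_3 C.
RULE = ("may_prevent", "definitional_manifestation_of", "may_treat")


def apply_rule(edges, rule):
    """Return edges (a, r_3, c) implied by the rule that are not already present."""
    r1, r2, r3 = rule
    existing = set(edges)
    inferred = set()
    for a, rel_ab, b in edges:
        if rel_ab != r1:
            continue
        for b2, rel_bc, c in edges:
            # Chain a --r_1--> b --r_2--> c through the shared middle node b.
            if rel_bc == r2 and b2 == b and (a, r3, c) not in existing:
                inferred.add((a, r3, c))
    return sorted(inferred)


inferred_edges = apply_rule(EDGES, RULE)
print(inferred_edges)
```

Only the capsaicin to eclampsia edge is reported, since the corresponding may_treat edge for group 202A already exists in the graph.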

FIG. 4 depicts a process flow 400 for inference rule extraction from a heterogeneous graph in accordance with an embodiment. The process flow 400 provides an example of a method for inference rule extraction. For purposes of explanation, the process flow 400 is described in terms of the examples of FIGS. 1-3C but can be implemented on various system configurations, including heterogeneous graphs with millions of nodes resulting in millions of extracted inference rules.

At block 402, a heterogeneous graph in a data store is accessed, such as the heterogeneous graph 102 in data store 103 of FIG. 1. The heterogeneous graph can include a plurality of nodes having a plurality of node types. The nodes are connected by edges having a plurality of relation types, as in the example of FIG. 2.

At block 404, one or more intermediary graphs are created based on the heterogeneous graph, such as intermediary graphs 112 of FIG. 1 and intermediary graphs 300A-300C of FIGS. 3A-3C. The one or more intermediary graphs include intermediary nodes that are the relation types of the edges of the heterogeneous graph. The one or more intermediary graphs also include intermediary links between the intermediary nodes based on shared instances of the nodes between the relation types in the heterogeneous graph. A source intermediary graph, such as source intermediary graph 300A of FIG. 3A, can be created as one of the intermediary graphs by connecting intermediary nodes with undirected links as the intermediary links, where the undirected links are based on the relation types of the intermediary nodes sharing a common source node in the heterogeneous graph. A target intermediary graph, such as target intermediary graph 300B of FIG. 3B, can be created as one of the intermediary graphs by connecting intermediary nodes with undirected links as the intermediary links, where the undirected links are based on the relation types of the intermediary nodes sharing a common target node in the heterogeneous graph. A target-source intermediary graph, such as target-source intermediary graph 300C of FIG. 3C, can be created as one of the intermediary graphs by connecting intermediary nodes with directed links as the intermediary links, where the directed links are based on the relation types of the intermediary nodes having a source node that is a target node of another relation type in the heterogeneous graph.

At block 406, the one or more intermediary graphs are traversed to find sets of relations based on the intermediary links according to a template. As an example, for each intermediary link between a first relation type and a second relation type in the target-source intermediary graph, the source intermediary graph can be examined to find a first set of relations in the source intermediary graph associated with the first relation type. The target intermediary graph can be examined to find a second set of relations in the target intermediary graph associated with the second relation type. An intersection between the first set of relations and the second set of relations can be determined.

At block 408, an inference rule is extracted from the heterogeneous graph based on finding the sets of relations in the one or more intermediary graphs. The inference rule defines an inferred relation type between at least two of the nodes of the heterogeneous graph. Inference rules can be stored in the extracted inference rules 114 of FIG. 1 for use by other processes that may apply generalized versions of the extracted inference rules 114 to identify missing relations, create higher level inferences, or perform other types of rule-based analysis. The heterogeneous graph can be traversed to extract all inference rules that are inferable from the heterogeneous graph according to the template, which may be one of the templates 110 of FIG. 1. A template may define a rule pattern, such as: three node types having three relation types between the three node types.

Turning now to FIG. 5, a high-level block diagram of a question-answer (QA) framework 500 where embodiments described herein can be utilized is generally shown.

The QA framework 500 can be implemented to generate a ranked list of answers 504 (and a confidence level associated with each answer) to a given question 502. In an embodiment, general principles implemented by the framework 500 to generate answers 504 to questions 502 include massive parallelism, the use of many experts, pervasive confidence estimation, and the integration of shallow and deep knowledge. In an embodiment, the QA framework 500 shown in FIG. 5 is implemented by the Watson™ product from IBM.

The QA framework 500 shown in FIG. 5 defines various stages of analysis in a processing pipeline. In an embodiment, each stage admits multiple implementations that can produce alternative results. At each stage, alternatives can be independently pursued as part of a massively parallel computation. Embodiments of the framework 500 do not assume that any component perfectly understands the question 502 and can just look up the right answer 504 in a database. Rather, many candidate answers can be proposed by searching many different resources, on the basis of different interpretations of the question (e.g., based on a category of the question). A commitment to any one answer is deferred while more and more evidence is gathered and analyzed for each answer and each alternative path through the system.

As shown in FIG. 5, the question and topic analysis 510 is performed and used in question decomposition 512. Hypotheses are generated by the hypothesis generation block 514 which uses input from the question decomposition 512, as well as data obtained via a primary search 516 through the answer sources 506 and candidate answer generation 518 to generate several hypotheses. Hypothesis and evidence scoring 526 is then performed for each hypothesis using evidence sources 508 and can include answer scoring 520, evidence retrieval 522 and deep evidence scoring 524.

A synthesis 528 is performed of the results of the multiple hypothesis and evidence scorings 526. Input to the synthesis 528 can include answer scoring 520, evidence retrieval 522, and deep evidence scoring 524. Learned models 530 can then be applied to the results of the synthesis 528 to generate a final confidence merging and ranking 532. A ranked list of answers 504 (and a confidence level associated with each answer) is then output.

Relation extraction plays a key role in information extraction in the QA framework 500 shown in FIG. 5. Embodiments of the inference rule extraction herein can be utilized by the QA framework 500 to improve relation extraction. Embodiments can be utilized, for example, in candidate answer generation 518, where extracted inference rules from the answer sources 506 can be used for potential candidate answer generation. Also, in evidence retrieval 522 and deep evidence scoring 524, extracted inference rules from the evidence sources 508 can be utilized to detect implicit relations across the question and passages.

The framework 500 shown in FIG. 5 can utilize embodiments of the inference rule extraction described herein to create learned models 530 by training statistical machine learning algorithms on prior sets of questions and answers to learn how best to weight each of the hundreds of features relative to one another. These weights can be used at run time to balance all of the features when combining the final scores for candidate answers to new questions 502. In addition, embodiments can be used to generate a KB based on a corpus of data that replaces or supplements commercially available KBs.

Referring now to FIG. 6, there is shown an embodiment of a processing system 600 for implementing the teachings herein. In this embodiment, the processing system 600 has one or more central processing units (processors) 601a, 601b, 601c, etc. (collectively or generically referred to as processor(s) 601). Processors 601, also referred to as processing circuits, are coupled to system memory 614 and various other components via a system bus 613. Read only memory (ROM) 602 is coupled to system bus 613 and may include a basic input/output system (BIOS), which controls certain basic functions of the processing system 600. The system memory 614 can include ROM 602 and random access memory (RAM) 610, which is read-write memory coupled to system bus 613 for use by processors 601.

FIG. 6 further depicts an input/output (I/O) adapter 607 and a network adapter 606 coupled to the system bus 613. I/O adapter 607 may be a small computer system interface (SCSI) adapter that communicates with a hard disk 603 and/or tape storage drive 605 or any other similar component. I/O adapter 607, hard disk 603, and tape storage drive 605 are collectively referred to herein as mass storage 604. Software 620 for execution on processing system 600 may be stored in mass storage 604. The mass storage 604 is an example of a tangible storage medium readable by the processors 601, where the software 620 is stored as instructions for execution by the processors 601 to perform a method, such as the process flow 400 of FIG. 4. Network adapter 606 interconnects system bus 613 with an outside network 616 enabling processing system 600 to communicate with other such systems. A screen (e.g., a display monitor) 615 is connected to system bus 613 by display adapter 612, which may include a graphics controller to improve the performance of graphics intensive applications and a video controller. In one embodiment, adapters 607, 606, and 612 may be connected to one or more I/O buses that are connected to system bus 613 via an intermediate bus bridge (not shown). Suitable I/O buses for connecting peripheral devices such as hard disk controllers, network adapters, and graphics adapters typically include common protocols, such as the Peripheral Component Interconnect (PCI). Additional input/output devices are shown as connected to system bus 613 via user interface adapter 608 and display adapter 612. A keyboard 609, mouse 640, and speaker 611 can be interconnected to system bus 613 via user interface adapter 608, which may include, for example, a Super I/O chip integrating multiple device adapters into a single integrated circuit.

Thus, as configured in FIG. 6, processing system 600 includes processing capability in the form of processors 601, storage capability including system memory 614 and mass storage 604, input means such as keyboard 609 and mouse 640, and output capability including speaker 611 and display 615. In one embodiment, a portion of system memory 614 and mass storage 604 collectively store an operating system such as the AIX® operating system from IBM Corporation to coordinate the functions of the various components shown in FIG. 6.

Technical effects and benefits include inference rule extraction from a heterogeneous graph using intermediary graphs to increase processing efficiency and reduce latency.
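
By way of a non-limiting illustration, the use of intermediary graphs to narrow the search for inference rules may be sketched in Python as follows. The sketch assumes the heterogeneous graph is supplied as (source, relation type, target) triples; the function names, the adjacency-set representation, and the example node and relation names are hypothetical and are not taken from the disclosure. Intermediary links are built from shared source nodes, shared target nodes, and target-to-source chaining, and for each directed link the neighbors of the first relation type in the source intermediary graph are intersected with the neighbors of the second relation type in the target intermediary graph. The result is a set of candidate relation-type triples only; each candidate would still need to be checked against node instances and the template before being treated as an inference rule.

```python
from collections import defaultdict
from itertools import combinations


def build_intermediary_graphs(edges):
    """Build three relation-type graphs from (source, relation, target) triples.

    Hypothetical helper: the adjacency-set representation is an assumption.
    """
    by_source = defaultdict(set)  # node -> relation types leaving that node
    by_target = defaultdict(set)  # node -> relation types entering that node
    for src, rel, dst in edges:
        by_source[src].add(rel)
        by_target[dst].add(rel)

    source_graph = defaultdict(set)         # undirected: share a source node
    target_graph = defaultdict(set)         # undirected: share a target node
    target_source_graph = defaultdict(set)  # directed: r1 -> r2

    for rels in by_source.values():
        for a, b in combinations(sorted(rels), 2):
            source_graph[a].add(b)
            source_graph[b].add(a)
    for rels in by_target.values():
        for a, b in combinations(sorted(rels), 2):
            target_graph[a].add(b)
            target_graph[b].add(a)
    # r1 -> r2 when some node is a target of r1 and a source of r2.
    for node, incoming in by_target.items():
        for r1 in incoming:
            for r2 in by_source.get(node, ()):
                target_source_graph[r1].add(r2)
    return source_graph, target_graph, target_source_graph


def candidate_rules(edges):
    """Return candidate (r1, r2, r3) triples read as: r1 then r2 suggests r3."""
    source_graph, target_graph, target_source_graph = build_intermediary_graphs(edges)
    candidates = set()
    for r1, followers in target_source_graph.items():
        for r2 in followers:
            shares_source = source_graph.get(r1, set())
            shares_target = target_graph.get(r2, set())
            for r3 in shares_source & shares_target:
                candidates.add((r1, r2, r3))
    return candidates


# Hypothetical three-node example with three relation types.
example_edges = [
    ("drugA", "inhibits", "geneX"),
    ("geneX", "causes", "diseaseY"),
    ("drugA", "treats", "diseaseY"),
]
print(candidate_rules(example_edges))
```

In this example the only candidate is the triple (inhibits, causes, treats), because "treats" shares a source node with "inhibits" and a target node with "causes". Working at the level of relation types in this way keeps the candidate search independent of the number of node instances once the intermediary graphs have been built.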

The present invention may be a system, a method, and/or a computer program product. The computer program product may include a computer readable storage medium (or media) having computer readable program instructions thereon for causing a processor to carry out aspects of the present invention. The computer readable storage medium can be a tangible device that can retain and store instructions for use by an instruction execution device.

The computer readable storage medium may be, for example, but is not limited to, an electronic storage device, a magnetic storage device, an optical storage device, an electromagnetic storage device, a semiconductor storage device, or any suitable combination of the foregoing. A non-exhaustive list of more specific examples of the computer readable storage medium includes the following: a portable computer diskette, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or Flash memory), a static random access memory (SRAM), a portable compact disc read-only memory (CD-ROM), a digital versatile disk (DVD), a memory stick, a floppy disk, a mechanically encoded device such as punch-cards or raised structures in a groove having instructions recorded thereon, and any suitable combination of the foregoing. A computer readable storage medium, as used herein, is not to be construed as being transitory signals per se, such as radio waves or other freely propagating electromagnetic waves, electromagnetic waves propagating through a waveguide or other transmission media (e.g., light pulses passing through a fiber-optic cable), or electrical signals transmitted through a wire.

Computer readable program instructions described herein can be downloaded to respective computing/processing devices from a computer readable storage medium or to an external computer or external storage device via a network, for example, the Internet, a local area network, a wide area network and/or a wireless network. The network may comprise copper transmission cables, optical transmission fibers, wireless transmission, routers, firewalls, switches, gateway computers and/or edge servers. A network adapter card or network interface in each computing/processing device receives computer readable program instructions from the network and forwards the computer readable program instructions for storage in a computer readable storage medium within the respective computing/processing device.

Computer readable program instructions for carrying out operations of the present invention may be assembler instructions, instruction-set-architecture (ISA) instructions, machine instructions, machine dependent instructions, microcode, firmware instructions, state-setting data, or either source code or object code written in any combination of one or more programming languages, including an object oriented programming language such as Smalltalk, C++ or the like, and conventional procedural programming languages, such as the “C” programming language or similar programming languages. The computer readable program instructions may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer or entirely on the remote computer or server. In the latter scenario, the remote computer may be connected to the user's computer through any type of network, including a local area network (LAN) or a wide area network (WAN), or the connection may be made to an external computer (for example, through the Internet using an Internet Service Provider). In some embodiments, electronic circuitry including, for example, programmable logic circuitry, field-programmable gate arrays (FPGA), or programmable logic arrays (PLA) may execute the computer readable program instructions by utilizing state information of the computer readable program instructions to personalize the electronic circuitry, in order to perform aspects of the present invention.

Aspects of the present invention are described herein with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the invention. It will be understood that each block of the flowchart illustrations and/or block diagrams, and combinations of blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer readable program instructions.

These computer readable program instructions may be provided to a processor of a general purpose computer, special purpose computer, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions/acts specified in the flowchart and/or block diagram block or blocks. These computer readable program instructions may also be stored in a computer readable storage medium that can direct a computer, a programmable data processing apparatus, and/or other devices to function in a particular manner, such that the computer readable storage medium having instructions stored therein comprises an article of manufacture including instructions which implement aspects of the function/act specified in the flowchart and/or block diagram block or blocks.

The computer readable program instructions may also be loaded onto a computer, other programmable data processing apparatus, or other device to cause a series of operational steps to be performed on the computer, other programmable apparatus or other device to produce a computer implemented process, such that the instructions which execute on the computer, other programmable apparatus, or other device implement the functions/acts specified in the flowchart and/or block diagram block or blocks.

The flowchart and block diagrams in the Figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods, and computer program products according to various embodiments of the present invention. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of instructions, which comprises one or more executable instructions for implementing the specified logical function(s). In some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems that perform the specified functions or acts or carry out combinations of special purpose hardware and computer instructions.

The terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the invention. As used herein, the singular forms “a”, “an” and “the” are intended to include the plural forms as well, unless the context clearly indicates otherwise. It will be further understood that the terms “comprises” and/or “comprising,” when used in this specification, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, element components, and/or groups thereof.

The corresponding structures, materials, acts, and equivalents of all means or step plus function elements in the claims below are intended to include any structure, material, or act for performing the function in combination with other claimed elements as specifically claimed. The description of the present invention has been presented for purposes of illustration and description, but is not intended to be exhaustive or limited to the invention in the form disclosed. Many modifications and variations will be apparent to those of ordinary skill in the art without departing from the scope and spirit of the invention. The embodiment was chosen and described in order to best explain the principles of the invention and the practical application, and to enable others of ordinary skill in the art to understand the invention for various embodiments with various modifications as are suited to the particular use contemplated.

What is claimed is:
 1. A method comprising: accessing a heterogeneous graph in a data store, the heterogeneous graph comprising a plurality of nodes having a plurality of node types, the nodes connected by edges having a plurality of relation types; creating one or more intermediary graphs based on the heterogeneous graph, the one or more intermediary graphs comprising intermediary nodes that are the relation types of the edges of the heterogeneous graph and further comprising intermediary links between the intermediary nodes based on shared instances of the nodes between the relation types in the heterogeneous graph; traversing the one or more intermediary graphs to find sets of relations based on the intermediary links according to a template; and extracting inference rules from the heterogeneous graph based on finding the sets of relations in the one or more intermediary graphs, each of the inference rules defining an inferred relation type between at least two of the nodes of the heterogeneous graph.
 2. The method of claim 1, wherein creating one or more intermediary graphs further comprises: creating a source intermediary graph comprising the intermediary nodes connected with undirected links as the intermediary links, the undirected links based on the relation types of the intermediary nodes sharing a common source node in the heterogeneous graph.
 3. The method of claim 2, wherein creating one or more intermediary graphs further comprises: creating a target intermediary graph comprising the intermediary nodes connected with undirected links as the intermediary links, the undirected links based on the relation types of the intermediary nodes sharing a common target node in the heterogeneous graph.
 4. The method of claim 3, wherein creating one or more intermediary graphs further comprises: creating a target-source intermediary graph comprising the intermediary nodes connected with directed links as the intermediary links, the directed links based on the relation types of the intermediary nodes having a source node that is a target node of another relation type in the heterogeneous graph.
 5. The method of claim 4, further comprising for each intermediary link between a first relation type and a second relation type in the target-source intermediary graph: finding a first set of relations in the source intermediary graph associated with the first relation type; finding a second set of relations in the target intermediary graph associated with the second relation type; and determining an intersection between the first set of relations and the second set of relations.
 6. The method of claim 1, further comprising: traversing the heterogeneous graph to extract all of the inference rules that are inferable from the heterogeneous graph according to the template.
 7. The method of claim 6, wherein the template defines a rule pattern as three node types having three relation types between the three node types.
 8. The method of claim 7, wherein a computational complexity to extract all of the inference rules from the heterogeneous graph according to the template comprising the rule pattern is of order (n²+T³) complexity, where n is the number of nodes in the heterogeneous graph and T is the number of relation types in the heterogeneous graph.